Understanding generalization error of SGD in nonconvex optimization

نویسندگان

چکیده

The success of deep learning has led to a rising interest in the generalization property stochastic gradient descent (SGD) method, and stability is one popular approach study it. Existing bounds based on do not incorporate interplay between optimization SGD underlying data distribution, hence cannot even capture effect randomized labels performance. In this paper, we establish error for by characterizing corresponding terms on-average variance gradients. Such characterizations lead improved experimentally explain random We also regularized risk minimization problem with strongly convex regularizers, obtain proximal SGD.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied nonconvex loss functions, but only considered the generalization error of the SGD in expectation. In this paper, we establish various generalization error bounds...

متن کامل

Theory of Deep Learning III: Generalization Properties of SGD

In Theory III we characterize with a mix of theory and experiments the consistency and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A present perceived puzzle is that deep networks show good predicitve performance when overparametrization relative to the number of training data suggests overfitting. We describe an exp...

متن کامل

Nonconvex Optimization Is Combinatorial Optimization

Difficult nonconvex optimization problems contain a combinatorial number of local optima, making them extremely challenging for modern solvers. We present a novel nonconvex optimization algorithm that explicitly finds and exploits local structure in the objective function in order to decompose it into subproblems, exponentially reducing the size of the search space. Our algorithm’s use of decom...

متن کامل

Improving Generalization Performance by Switching from Adam to SGD

Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to Stochastic gradient descent (SGD). These methods tend to perform well in the initial portion of training but are outperformed by SGD at later stages of training. We investigate a hybrid strategy that begins training with an adaptive method and switc...

متن کامل

Nonconvex Robust Optimization

We propose a novel robust optimization technique, which is applicable to nonconvex and simulation-based problems. Robust optimization finds decisions with the best worst-case performance under uncertainty. If constraints are present, decisions should also be feasible under perturbations. In the real-world, many problems are nonconvex and involve computer-based simulations. In these applications...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Machine Learning

سال: 2021

ISSN: ['0885-6125', '1573-0565']

DOI: https://doi.org/10.1007/s10994-021-06056-w